Skip to content

Conversation

ChristoGrab
Copy link
Collaborator

@ChristoGrab ChristoGrab commented Sep 25, 2025

What

Resolves: https://github.com/airbytehq/airbyte-internal-issues/issues/14540

Loom Demo: https://www.loom.com/share/3f063d020c774861ab0c40a10c429b11

Adds basic try/catch error handling to the manifest-server calls to ensure errors during connector builder calls are surfaced in the builder UI.

Currently, customers are experiencing issues with errors during test reads being swallowed and thrown as generic 500-level errors.

493026242-585d16bd-1521-46de-837a-84e193c0a2b5

How to Test

For quick backend tests, use Postman against a locally deployed instance of the manifest-server. This was the test manifest I used in the request body to replicate the error encountered in the Swarm ticket.

On main, this will cause an unhandled exception to throw the 500 server error.

Screenshot 2025-09-25 at 2 49 42 PM

With the changes, the actual error is caught and the message displayed in the details.

Screenshot 2025-09-25 at 2 48 52 PM

To Test in Platform

Steps to run manifest-server locally and connect to local OSS Airbyte instance:

https://gist.github.com/ChristoGrab/1abaaacc0d1879d8642c946336feb5a9

Summary by CodeRabbit

  • New Features
    • Standardized error responses across manifest endpoints with a single-field "detail" payload and secret filtering applied to returned messages.
  • Bug Fixes
    • Converted previously unhandled exceptions into consistent HTTP 400 responses while preserving existing HTTPException semantics.
  • Documentation
    • OpenAPI bumped to v0.2.0 and documents new 400 responses and the ErrorResponse schema.
  • Tests
    • Added tests validating error handling across manifest endpoints.
  • Chores
    • Application version bumped to 0.2.0.

@github-actions github-actions bot added bug Something isn't working security labels Sep 25, 2025
Copy link

👋 Greetings, Airbyte Team Member!

Here are some helpful tips and reminders for your convenience.

Testing This CDK Version

You can test this version of the CDK using the following:

# Run the CLI from this branch:
uvx 'git+https://github.com/airbytehq/airbyte-python-cdk.git@christo/manifest-server-errors#egg=airbyte-python-cdk[dev]' --help

# Update a connector to use the CDK from this branch ref:
cd airbyte-integrations/connectors/source-example
poe use-cdk-branch christo/manifest-server-errors

Helpful Resources

PR Slash Commands

Airbyte Maintainers can execute the following slash commands on your PR:

  • /autofix - Fixes most formatting and linting issues
  • /poetry-lock - Updates poetry.lock file
  • /test - Runs connector tests with the updated CDK
  • /poe build - Regenerate git-committed build artifacts, such as the pydantic models which are generated from the manifest JSON schema in YAML.
  • /poe <command> - Runs any poe command in the CDK environment

📝 Edit this welcome message.

Copy link

github-actions bot commented Sep 25, 2025

PyTest Results (Fast)

3 801 tests  +5   3 789 ✅ +5   6m 30s ⏱️ +4s
    1 suites ±0      12 💤 ±0 
    1 files   ±0       0 ❌ ±0 

Results for commit 77f612e. ± Comparison against base commit f360d91.

♻️ This comment has been updated with latest results.

Copy link

github-actions bot commented Sep 25, 2025

PyTest Results (Full)

3 804 tests  +5   3 792 ✅ +5   11m 3s ⏱️ +3s
    1 suites ±0      12 💤 ±0 
    1 files   ±0       0 ❌ ±0 

Results for commit 77f612e. ± Comparison against base commit f360d91.

♻️ This comment has been updated with latest results.

@ChristoGrab
Copy link
Collaborator Author

ChristoGrab commented Sep 25, 2025

/autofix

Auto-Fix Job Info

This job attempts to auto-fix any linting or formating issues. If any fixes are made,
those changes will be automatically committed and pushed back to the PR.

Note: This job can only be run by maintainers. On PRs from forks, this command requires
that the PR author has enabled the Allow edits from maintainers option.

PR auto-fix job started... Check job output.

✅ Changes applied successfully.

@ChristoGrab ChristoGrab marked this pull request as ready for review September 25, 2025 21:30
@Copilot Copilot AI review requested due to automatic review settings September 25, 2025 21:30
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR adds basic error handling to the manifest-server calls to improve error reporting in the connector builder UI. Previously, errors during operations like test reads were being swallowed and returned as generic 500-level errors, making debugging difficult for customers.

Key Changes

  • Added try/catch blocks around all manifest-server endpoint operations (test_read, check, discover, resolve, full_resolve)
  • Standardized error responses to return HTTP 400 with descriptive error messages using AirbyteTracedException
  • Added comprehensive test coverage for error handling scenarios

Reviewed Changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
airbyte_cdk/manifest_server/routers/manifest.py Added error handling try/catch blocks to all endpoints with descriptive error messages
airbyte_cdk/manifest_server/api_models/manifest.py Added ErrorResponse model for standardized error responses
airbyte_cdk/manifest_server/api_models/init.py Exported the new ErrorResponse model
airbyte_cdk/manifest_server/openapi.yaml Updated OpenAPI spec to include 400 error responses and version bump
airbyte_cdk/manifest_server/app.py Updated application version to 0.2.0
unit_tests/manifest_server/routers/test_manifest.py Added comprehensive test coverage for error handling scenarios

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

@airbytehq airbytehq deleted a comment from coderabbitai bot Sep 25, 2025
Copy link
Contributor

coderabbitai bot commented Sep 25, 2025

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

📝 Walkthrough

Walkthrough

Adds a new ErrorResponse Pydantic model and exposes it via api_models; bumps FastAPI/OpenAPI version to 0.2.0; introduces 400 Bad Request responses and try/except sanitization in manifest router endpoints; and adds unit tests asserting error translation to ErrorResponse.

Changes

Cohort / File(s) Summary
API models
airbyte_cdk/manifest_server/api_models/__init__.py, airbyte_cdk/manifest_server/api_models/manifest.py
Added ErrorResponse Pydantic model (detail: str) and exported it via __all__.
Router error handling
airbyte_cdk/manifest_server/routers/manifest.py
Wrapped endpoint logic (test_read, check, discover, resolve, full_resolve) in try/except, sanitizing errors with filter_secrets and returning HTTP 400 with ErrorResponse.detail; added 400 response annotations and imports.
OpenAPI spec
airbyte_cdk/manifest_server/openapi.yaml
Bumped API spec to 0.2.0; added ErrorResponse schema and new 400 responses referencing it for manifest endpoints.
App version
airbyte_cdk/manifest_server/app.py
Updated FastAPI app version from 0.1.0 to 0.2.0.
Unit tests
unit_tests/manifest_server/routers/test_manifest.py
Added tests mocking failures to assert HTTP 400 responses containing sanitized ErrorResponse.detail for each manifest endpoint.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor Client
  participant Router as FastAPI Router
  participant Processor as ManifestCommandProcessor / build_source
  participant Sanitizer as filter_secrets
  participant ErrorResp as ErrorResponse

  Client->>Router: POST /v1/manifest/* (payload)
  Router->>Processor: execute command / build source
  alt success
    Processor-->>Router: result
    Router-->>Client: 200 OK (normal payload)
  else exception
    Processor-->>Router: raises Exception
    Router->>Sanitizer: sanitize exception message
    Sanitizer-->>Router: sanitized message
    Router->>ErrorResp: construct {detail: sanitized message}
    Router-->>Client: 400 Bad Request (ErrorResponse)
  end
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested reviewers

  • aaronsteers
  • brianjlai

Would you like me to list the exact changed code snippets for each endpoint to speed review, wdyt?

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check ✅ Passed The title succinctly and accurately summarizes the main change in the pull request by indicating that basic error handling is being added to the manifest-server calls, which aligns directly with the implemented try/catch enhancements and new ErrorResponse handling throughout the API.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch christo/manifest-server-errors

Comment @coderabbitai help to get the list of available commands and usage tips.

@girarda
Copy link
Contributor

girarda commented Oct 9, 2025

Hi team, can we get a code review so we can ship this? This is preventing us from iterating on builder projects because errors are not exposed to the user

Copy link
Contributor

@pedroslopez pedroslopez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for taking this on Christo! And apologies it's taken so long to review.

Overall makes sense - left one question about whether the wrapping in AirbyteTracedException is necessary. I think we should remove it if not necessary. I'll let you decide and merge when ready.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (1)
airbyte_cdk/manifest_server/routers/manifest.py (1)

186-204: Bare re-raise is fine; add intent comment?

Keeping parity with earlier feedback, would you add a brief comment clarifying the intentional bare raise? wdyt?

-    except HTTPException:
-        # Re-raise HTTPExceptions as-is (like the catalog None check above)
-        raise
+    except HTTPException:
+        # Re-raise HTTPExceptions as-is (like the catalog None check above)
+        # Intentional bare raise: preserve original status and message
+        raise
🧹 Nitpick comments (7)
unit_tests/manifest_server/routers/test_manifest.py (1)

539-657: Add a sanitization test by patching filter_secrets?

To ensure secret scrubbing is exercised, could we patch the router’s filter_secrets to verify it’s invoked and controls the response? For example:

+@patch("airbyte_cdk.manifest_server.routers.manifest.filter_secrets", return_value="<<sanitized>>")
+@patch("airbyte_cdk.manifest_server.routers.manifest.ManifestCommandProcessor")
+@patch("airbyte_cdk.manifest_server.routers.manifest.build_source")
+def test_check_sanitization(
+    self, mock_build_source, mock_runner_class, mock_filter, sample_manifest, sample_config, mock_source
+):
+    mock_build_source.return_value = mock_source
+    mock_runner = Mock()
+    mock_runner.check_connection.side_effect = RuntimeError("secret-token-123")
+    mock_runner_class.return_value = mock_runner
+    response = client.post("/v1/manifest/check", json={"manifest": sample_manifest, "config": sample_config})
+    assert response.status_code == 400
+    assert response.json()["detail"] == "<<sanitized>>"

This validates the sanitization pathway without coupling to secret registration internals, wdyt?

airbyte_cdk/manifest_server/routers/manifest.py (6)

70-76: Document 422 validation responses too?

Since Pydantic validation returns 422s, should we add 422 in responses for completeness (applies to all endpoints)? wdyt?

 @router.post(
     "/test_read",
     operation_id="testRead",
     responses={
-        400: {"description": "Bad Request - Error processing request", "model": ErrorResponse}
+        400: {"description": "Bad Request - Error processing request", "model": ErrorResponse},
+        422: {"description": "Validation error"},
     },
 )

121-137: Consider logging and/or simplifying wrapping.

We create AirbyteTracedException but only use its message; would it be simpler to filter_secrets(str(exc)) directly, or alternatively log the full traced error (logger.exception or error.emit_message) for observability? wdyt?

-    except Exception as exc:
-        error = AirbyteTracedException.from_exception(
-            exc, message=f"Error reading stream: {str(exc)}"
-        )
-        # Filter secrets from error message before returning to client
-        sanitized_message = filter_secrets(error.message or str(exc))
-        raise HTTPException(status_code=400, detail=sanitized_message)
+    except Exception as exc:
+        # Optional: logger.exception("test_read failed")  # keep server-side trace
+        sanitized_message = filter_secrets(f"Error reading stream: {str(exc)}")
+        raise HTTPException(status_code=400, detail=sanitized_message)

If you prefer the traced exception for future use, maybe emit/log it instead, wdyt?


156-167: Preserve existing HTTPExceptions (consistency with discover).

To keep explicit HTTP statuses/messages from safe_build_source (or future checks), shall we re-raise HTTPException before the generic catch (like discover does)? wdyt?

 def check(request: CheckRequest) -> CheckResponse:
@@
-    try:
+    try:
         source = safe_build_source(request.manifest.model_dump(), request.config.model_dump())
         runner = ManifestCommandProcessor(source)
         success, message = runner.check_connection(request.config.model_dump())
         return CheckResponse(success=success, message=message)
+    except HTTPException:
+        # Preserve explicit HTTP errors raised upstream
+        raise
     except Exception as exc:
-        error = AirbyteTracedException.from_exception(
-            exc, message=f"Error checking connection: {str(exc)}"
-        )
-        # Filter secrets from error message before returning to client
-        sanitized_message = filter_secrets(error.message or str(exc))
+        sanitized_message = filter_secrets(f"Error checking connection: {str(exc)}")
         raise HTTPException(status_code=400, detail=sanitized_message)

223-232: Resolve: preserve HTTPExceptions like discover/check.

For consistency and to avoid masking explicit statuses, shall we re-raise HTTPException here too? wdyt?

 def resolve(request: ResolveRequest) -> ManifestResponse:
@@
-    try:
+    try:
         source = safe_build_source(request.manifest.model_dump(), {})
         return ManifestResponse(manifest=Manifest(**source.resolved_manifest))
+    except HTTPException:
+        raise
     except Exception as exc:
-        error = AirbyteTracedException.from_exception(
-            exc, message=f"Error resolving manifest: {str(exc)}"
-        )
-        # Filter secrets from error message before returning to client
-        sanitized_message = filter_secrets(error.message or str(exc))
+        sanitized_message = filter_secrets(f"Error resolving manifest: {str(exc)}")
         raise HTTPException(status_code=400, detail=sanitized_message)

255-280: Avoid mutating source.resolved_manifest; preserve HTTPExceptions; tiny wording nit.

full_resolve currently shallow-copies manifest and then mutates streams in-place, which can side-effect source.resolved_manifest across requests. Shall we deep copy before mutation, and also re-raise HTTPException for consistency? Also, consider “fully resolving” wording (optional) to read better. wdyt?

+import copy
@@
-        source = safe_build_source(request.manifest.model_dump(), request.config.model_dump())
-        manifest = {**source.resolved_manifest}
+        source = safe_build_source(request.manifest.model_dump(), request.config.model_dump())
+        manifest = copy.deepcopy(source.resolved_manifest)
         streams = manifest.get("streams", [])
         for stream in streams:
             stream["dynamic_stream_name"] = None
@@
-        return ManifestResponse(manifest=Manifest(**manifest))
+        return ManifestResponse(manifest=Manifest(**manifest))
-    except Exception as exc:
-        error = AirbyteTracedException.from_exception(
-            exc, message=f"Error full resolving manifest: {str(exc)}"
-        )
-        # Filter secrets from error message before returning to client
-        sanitized_message = filter_secrets(error.message or str(exc))
+    except HTTPException:
+        raise
+    except Exception as exc:
+        # Optional: change to "fully resolving" if you want improved grammar
+        sanitized_message = filter_secrets(f"Error full resolving manifest: {str(exc)}")
         raise HTTPException(status_code=400, detail=sanitized_message)

This prevents cross-request mutation and keeps error handling consistent, wdyt?


17-26: Minor: centralize exception-to-HTTP conversion?

The try/except + sanitize + raise blocks repeat. Would extracting a small helper (e.g., raise_sanitized_http_400(context, exc)) reduce duplication and keep messages consistent across endpoints? wdyt?

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9df9c67 and 1c95c99.

📒 Files selected for processing (2)
  • airbyte_cdk/manifest_server/routers/manifest.py (7 hunks)
  • unit_tests/manifest_server/routers/test_manifest.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
unit_tests/manifest_server/routers/test_manifest.py (3)
unit_tests/manifest_server/command_processor/test_processor.py (2)
  • sample_config (45-47)
  • mock_source (35-37)
airbyte_cdk/manifest_server/routers/manifest.py (2)
  • test_read (77-137)
  • discover (177-204)
airbyte_cdk/manifest_server/command_processor/processor.py (3)
  • test_read (30-57)
  • check_connection (59-84)
  • discover (86-107)
airbyte_cdk/manifest_server/routers/manifest.py (4)
airbyte_cdk/utils/airbyte_secrets_utils.py (1)
  • filter_secrets (73-80)
airbyte_cdk/utils/traced_exception.py (2)
  • AirbyteTracedException (25-145)
  • from_exception (103-121)
airbyte_cdk/manifest_server/api_models/manifest.py (1)
  • ErrorResponse (88-91)
airbyte_cdk/manifest_server/command_processor/processor.py (4)
  • test_read (30-57)
  • ManifestCommandProcessor (23-119)
  • check_connection (59-84)
  • discover (86-107)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
  • GitHub Check: Check: destination-motherduck
  • GitHub Check: Check: source-pokeapi
  • GitHub Check: Check: source-shopify
  • GitHub Check: Check: source-intercom
  • GitHub Check: Check: source-hardcoded-records
  • GitHub Check: Pytest (Fast)
  • GitHub Check: Pytest (All, Python 3.13, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.10, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.12, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.11, Ubuntu)
  • GitHub Check: SDM Docker Image Build
  • GitHub Check: Manifest Server Docker Image Build
🔇 Additional comments (1)
unit_tests/manifest_server/routers/test_manifest.py (1)

539-657: Great coverage for CDK error translation to 400.

Assertions validate detail shape and meaningful prefixes. Nice.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
airbyte_cdk/manifest_server/routers/manifest.py (1)

109-133: Consider moving safe_build_source inside the try block for consistency.

In test_read, safe_build_source is called at line 109 (before the try block starts at line 120), while in the other endpoints (check, discover, resolve, full_resolve) it's called inside the try block. This means validation errors from safe_build_source will propagate with the original "Invalid manifest" message in test_read, but will be caught and prefixed with "Error checking connection:", etc. in other endpoints.

Is this intentional to preserve the original validation error message in test_read, or should it be moved inside the try block for consistency, wdyt?

Additionally, consider logging the original exception before sanitizing and re-raising to aid debugging:

     runner = ManifestCommandProcessor(source)
     try:
         cdk_result = runner.test_read(
             config_dict,
             catalog,
             converted_state,
             request.record_limit,
             request.page_limit,
             request.slice_limit,
         )
         return StreamReadResponse.model_validate(asdict(cdk_result))
     except Exception as exc:
+        # Log the original exception for debugging
+        import logging
+        logging.getLogger("airbyte.manifest-server").exception("Error in test_read")
         # Filter secrets from error message before returning to client
         sanitized_message = filter_secrets(f"Error reading stream: {str(exc)}")
         raise HTTPException(status_code=400, detail=sanitized_message)
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1c95c99 and 77f612e.

📒 Files selected for processing (1)
  • airbyte_cdk/manifest_server/routers/manifest.py (7 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
airbyte_cdk/manifest_server/routers/manifest.py (5)
airbyte_cdk/utils/airbyte_secrets_utils.py (1)
  • filter_secrets (73-80)
airbyte_cdk/manifest_server/api_models/manifest.py (1)
  • ErrorResponse (88-91)
airbyte_cdk/manifest_server/command_processor/processor.py (4)
  • test_read (30-57)
  • ManifestCommandProcessor (23-119)
  • check_connection (59-84)
  • discover (86-107)
airbyte_cdk/manifest_server/api_models/stream.py (1)
  • StreamReadResponse (67-76)
airbyte_cdk/sources/declarative/concurrent_declarative_source.py (4)
  • discover (382-385)
  • resolved_manifest (350-360)
  • streams (388-420)
  • dynamic_streams (510-514)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: Check: source-shopify
  • GitHub Check: Pytest (Fast)
  • GitHub Check: Pytest (All, Python 3.10, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.13, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.11, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.12, Ubuntu)
🔇 Additional comments (8)
airbyte_cdk/manifest_server/routers/manifest.py (8)

17-17: LGTM!

The filter_secrets import is properly used throughout the file to sanitize error messages before returning them to clients.


24-24: LGTM!

The ErrorResponse model is correctly integrated into the API responses for all endpoints.


69-75: LGTM!

The OpenAPI documentation correctly describes the 400 error response with the ErrorResponse model.


136-161: LGTM!

The try/except block correctly wraps the entire connection check logic and sanitizes error messages before returning them to clients.


188-190: LGTM!

The HTTPException re-raise pattern correctly preserves the original status code (e.g., 422 for missing catalog) while catching and sanitizing other exceptions as 400s. This is the right approach for distinguishing between different error types.


163-195: Good pattern for preserving HTTP status codes.

The discover endpoint correctly re-raises HTTPExceptions (like the 422 for missing catalog at lines 184-186) while catching and sanitizing other exceptions as 400s. This ensures that specific HTTP status codes are preserved where appropriate.


197-220: LGTM!

The error handling follows the established pattern consistently.


222-264: LGTM!

The try/except block correctly handles errors during manifest resolution and dynamic stream generation, with proper secret filtering.

@ChristoGrab ChristoGrab merged commit 8423ee4 into main Oct 15, 2025
28 of 29 checks passed
@ChristoGrab ChristoGrab deleted the christo/manifest-server-errors branch October 15, 2025 22:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working security

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants